Weighted Finite-State Morphological Analysis of Finnish Inflection and Compounding

Authors

  • Krister Lindén
  • Tommi Pirinen
Abstract

Finnish has very productive compounding and a rich inflectional system, which causes ambiguity in the morphological segmentation of compounds produced by finite-state transducer methods. To disambiguate the compound segmentations, we compare three different strategies, which we cast in a probabilistic framework. We present a method for implementing the probabilistic framework as part of the process of building lexc-style morpheme sub-lexicons, creating weighted lexical transducers. To implement the structurally disambiguating morphological analyzer, we use the HFST-LEXC tool, which is part of the open-source Helsinki Finite-State Technology. This is the first time all three principles have been cast in a probabilistic framework and compared on the same corpus using one tool. On our Finnish test corpus, the best method achieves 99.98% precision and recall.
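The idea of ranking competing compound segmentations by weight can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use HFST: it assumes a toy lexicon with invented counts, turns each relative frequency into a negative log probability (the usual convention for tropical-semiring weights), sums the weights of the parts, and prefers the segmentation with the lowest total. The names `COUNTS`, `weight`, `segmentations`, and `best_segmentation` are hypothetical.

```python
import math

# Hypothetical word counts, invented for illustration (not from the paper's corpus).
COUNTS = {"puutarha": 50, "puu": 200, "tarha": 10, "tuoli": 40}
TOTAL = sum(COUNTS.values())


def weight(word):
    """Negative log of the word's relative frequency; lower means more likely."""
    return -math.log(COUNTS[word] / TOTAL)


def segmentations(word, lexicon):
    """Return every way to split `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in lexicon:
            for rest in segmentations(word[i:], lexicon):
                results.append([prefix] + rest)
    return results


def best_segmentation(word):
    """Pick the segmentation whose summed part weights are smallest."""
    scored = [(sum(weight(p) for p in parts), parts)
              for parts in segmentations(word, COUNTS)]
    total, parts = min(scored)
    return parts, total


if __name__ == "__main__":
    for parts in segmentations("puutarhatuoli", COUNTS):
        print(parts, round(sum(weight(p) for p in parts), 3))
    print("best:", best_segmentation("puutarhatuoli")[0])
```

With these invented counts the two-part reading puutarha + tuoli receives a lower total weight than puu + tarha + tuoli. In a weighted lexical transducer the same comparison falls out of the path weights, so no separate enumeration step is needed.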

Similar resources

Weighted Finite-State Morphological Analysis of Finnish Compounding with HFST-LEXC

Finnish has very productive compounding and a rich inflectional system, which causes ambiguity in the morphological segmentation of compounds produced by finite-state transducer methods. To disambiguate the compound segmentations, we compare three different strategies, which are all cast in the same probabilistic framework and compared for the first time. We present a method for implem...

Weighting Finite-State Morphological Analyzers using HFST Tools

In a language with very productive compounding and a rich inflectional system, e.g. Finnish, new words are to a large extent formed by compounding. In order to disambiguate between the possible compound segmentations, a probabilistic strategy has been found effective by Lindén and Pirinen [7]. In this article, we present a method for implementing the probabilistic framework as a separate proces...

Evaluation of Finite State Morphological Analyzers Based on Paradigm Extraction from Wiktionary

Wiktionary provides lexical information for an increasing number of languages, including morphological inflection tables. It is a good resource for automatically learning rule-based analysis of the inflectional morphology of a language. This paper performs an extensive evaluation of a method to extract generalized paradigms from morphological inflection tables, which can be converted to weighte...

Complexity, two-level morphology and Finnish

The two-level model provides a language-independent framework for describing phonological and morphological phenomena associated with word inflection, derivation and compounding. The model can be expressed in terms of finite-state machines, and it is easy to implement. The model has, in fact, two aspects: (1) it is a linguistic formalism for describing phonological phenomena, and (2) it is ...

A Modular Approach to Turkish Noun Compounding: The Integration of a Finite-State Model

In this paper, we describe the design and integration of a three-level cascaded non-deterministic finite-state model of Turkish compounding into Turkish PAPPI, a comprehensive syntactic parser in the principles-and-parameters (P&P) framework. Our approach is to handle compounding as an intermediate stage between morphological analysis and syntactic parsing. We discuss how the compounding machine ...

Publication date: 2009